- (c) Copyright 1989-1999 Amiga, Inc. All rights reserved.
- The information contained herein is subject to change without notice, and
- is provided "as is" without warranty of any kind, either expressed or implied.
- The entire risk as to the use of this information is assumed by the user.
-
-
-
-
- Introduction to 1.3 IEEE
- Double Precision Libraries
-
- by Dale Luck
-
-
-
-
- The basic double precision IEEE library has been rewritten for V1.3.
- The new library is up to 4 times faster than the old one that came with
- V1.2. Several bugs were also fixed, and the routines now produce
- slightly more accurate results. I've listed some benchmarks comparing
- the two versions of the libraries at the end of this article.
-
- Besides the faster software emulation of floating point, the new IEEE math
- library recognizes and uses the 68020/68881 processor combination and
- will use the special floating point instructions available. Also, if
- an auto-configured math resource is available, it will use that as well.
- Typically, this resource would point to the base of a 68881 designed as
- a 16 bit IO port. But it could be another device as well.
-
- With the new library, you also have the ability to programmatically trap
- math errors such as overflow and divide by zero. Your program can now
- ignore them or take suitable action without visiting the GURU.
-
- In addition to a new version of the basic mathieeedoubbas.library, a
- second library supporting transcendental functions has been added. The
- new library is named mathieeedoubtrans.library, for IEEE double
- precision transcendental library. It supports the same functions as the
- transcendental library for the Motorola fast floating point, such as sine,
- cosine, square root, etc. This library can also identify and use the
- 68020/68881 combination or other math resources, and it has a very fast
- software square root routine.
-
-
- When Should You Use These Libraries?
-
- These libraries have been benchmarked as the fastest IEEE double precision
- libraries available on the Amiga as well as outperforming almost all other
- software math libraries in the Amiga class personal workstation market.
-
- If you need the precision of IEEE double, and wish to have a transparent
- improvement in speed when your programs run on machines with math
- coprocessors, then you should use these libraries. All the decision
- making is done by the library when it is first initialized and it will
- use the fastest available resources to do your math. You only need
- one program to support a standard Amiga, a 68020/68881 Amiga, or an
- Amiga with an external math coprocessor. It works automatically.
-
-
-
-
-
- When Should You Avoid These Libraries?
-
- If you don't need double precision, use the Motorola fast floating point
- routines. As you can see from the benchmarks, the Motorola routines are
- still quite a bit faster.
-
- If you want your math to be the fastest possible, you will want to use
- the new instructions available on the 68020/68881 directly in your code.
- In that case, you would not need the IEEE libraries. However, this would
- prevent your code from running on conventional 68000 based Amigas unless
- you supply different versions of your code for each configuration.
-
-
-
- Floating Point Formats
-
- Here's a chart comparing the various methods of representing floating
- point numbers used by Amiga system software. The IEEE double precision
- libraries operate on 64 bit quantities. The Motorola FFP libraries use
- 32 bits.
-
- Note that there is a "hidden" bit in the fraction part of IEEE numbers.
- Since the leading bit of every normalized number is 1, it is not stored.
-
-
-
-                           Motorola     Single      Double
-    Field Size (bits)        FFP         IEEE        IEEE
-
-    Sign                       1           1           1
-    Exponent                   7           8          11
-    Fraction                  24         23+1        52+1
-
-    Total                     32          32          64
-
-
-
-
-    Minimum (+) number     5.4e-20     1.2e-38     2.2e-308
-
-    Largest (+) number     9.2e+18     3.4e+38     1.8e+308
-
-    Minimum (+) number       n/a       1.4e-45     4.9e-324
-    (denormalized)
-
- Denormalized means reduced in precision so that numbers closer
- to zero can be represented.
-
-
-
-
-
-
-
-
-
- Floating Point Representation
-
-
- +--------+--------+--------+--------+
- |ffffffff|ffffffff|ffffffff|Seeeeeee| Motorola FFP
- +--------+--------+--------+--------+
-
- +--------+--------+--------+--------+
- |Seeeeeee|efffffff|ffffffff|ffffffff| IEEE Single
- +--------+--------+--------+--------+
-
-
- IEEE Double
-
- +--------+--------+--------+--------+--------+--------+--------+--------+
- |Seeeeeee|eeeeffff|ffffffff|ffffffff|ffffffff|ffffffff|ffffffff|ffffffff|
- +--------+--------+--------+--------+--------+--------+--------+--------+
-
- S = Sign bit
- f = fraction bits
- e = exponent bits
-
-
- The scheme used in IEEE floating point representation includes a few
- "special" numbers. Certain patterns of bits are used to represent
- exceptions:
-
- o NAN 'Not A Number' (result of 0/0)
-
- o INF 'Infinity' (result of 1/0)
-
- There are other assigned patterns in addition to these two.
-
-
-
-
-
- Using the Libraries
-
- The new IEEE libraries should be placed in the :libs directory. Use
- the mathieeedoubbas.library to replace the old library of that same
- name. The mathieeedoubtrans.library is an all new addition.
-
- Code that calls routines in these libraries will have to be linked
- to the new .lib files which also have awkward names. They are
- mathieeedoubbas_lib.lib and mathieeedoubtrans_lib.lib. And there
- is a new .fd file for the transcendental functions.
-
- Using the IEEE routines is straightforward, since they are standard
- libraries. Simply open the library, use the routines and close the
- library when you are done. For example, to use the Sine routine:
-
-
-
-
-
- /* IEEE Sine Routine                                    */
- /* Compile under Lattice 4.0 by linking with c.o +      */
- /* mathieeedoubbas_lib.lib + mathieeedoubtrans_lib.lib  */
- /* + lcm.lib + lc.lib + amiga.lib                       */
-
- #include <exec/types.h>
- #include <exec/libraries.h>
- #include <stdio.h>
-
- double IEEEDPSin();
- struct Library *OpenLibrary();
-
- /* Library base pointers used by the linked stub routines */
- extern struct Library *MathIeeeDoubBasBase;
- struct Library *MathIeeeDoubTransBase;
-
- void
- main()
- {
-     double x = 0;
-
-     MathIeeeDoubBasBase = OpenLibrary("mathieeedoubbas.library", 0);
-     if (MathIeeeDoubBasBase == 0) exit(20);   /* nonzero code reports failure */
-
-     MathIeeeDoubTransBase = OpenLibrary("mathieeedoubtrans.library", 0);
-     if (MathIeeeDoubTransBase == 0)
-     {
-         CloseLibrary(MathIeeeDoubBasBase);
-         exit(20);
-     }
-
-     x = IEEEDPSin((double) 60);               /* argument is in radians */
-     printf("sin 60 = %e\n", x);
-
-     /* Close in the reverse order of opening */
-     CloseLibrary(MathIeeeDoubTransBase);
-     CloseLibrary(MathIeeeDoubBasBase);
- }
-
-
-
- Hardware Developer Information
-
- To make use of CBM's standard peripheral support for 68881 you must design
- your peripheral to autoconfig. Your autoconfig software must create a
- resource and add it to the resource list. The name of this resource is
- "MathIEEE.resource". The IEEE library will attempt to open this resource.
- If it finds it, it will extract the BaseAddr pointer and copy it into its
- library structure. If the BaseAddr pointer is non-null it will use a
- different list of routine entry points when the IEEE library is initialized.
-
- After the IEEE library is initialized, the library again checks the resource
- for alternate function bits in Flags of the resource. The Basic library only
- checks the DblBasAlt bit, and the transcendental library only checks the
- DblTransAlt bit. If they are set, the library routine will call the function
- whose address is in the corresponding Init field. The arguments passed are
- a6=sysbase, a1=resource and a2=mathlibrary.
-
- If your device is not a 68881 then you may need to use this mechanism.
- There are separate bits for different library capabilities in case your
- math resource is only able to handle a limited set of functions. This lets
- you tie in a math processor that may only provide addition, subtraction,
- multiplication and division functions. The rest of the software will use
- it transparently by calling your alternate routines.
-
- Amiga does not provide for arbitrating a math accelerator in a multitasking
- environment. Therefore, you must provide your own support for this when your
- device autoconfigs. The only exception is the 68020/68881 combination, for
- which support has been standard since V1.2. Arbitration usually involves
- saving and restoring the state of your hardware device between task switches.
-
- We recommend that you look at the tc_Switch and tc_Launch vectors in the task
- data structure. These are called each time control transfers from one task to
- another. Remember not to assume that you are the only process needing to use
- those vectors.
-
- The resource data structure is as follows:
-
-     STRUCTURE MathIEEE,LN_SIZE
-         UWORD   MathIEEE_Flags
-         ULONG   MathIEEE_BaseAddr        ; for standard 68881 support
-         ULONG   MathIEEE_DblBasInit      ; something else besides 68881
-         ULONG   MathIEEE_DblTransInit    ; something else besides 68881
-         ULONG   MathIEEE_SnglBasInit     ; something else besides 68881
-         ULONG   MathIEEE_SnglTransInit   ; something else besides 68881
-         LABEL   MathIEEE_sizeof
- *
- * Bits for MathIEEE_Flags. All unassigned bits must be 0
- *
-         BITDEF  MathIEEE,DblBasAlt,0     ; alternate Basic library
-         BITDEF  MathIEEE,DblTransAlt,1   ; alternate Trans library
-         BITDEF  MathIEEE,SnglBasAlt,2    ; alternate Basic library
-         BITDEF  MathIEEE,SnglTransAlt,3  ; alternate Trans library
-
-
- The MathIEEE resource structure may grow in the future. Extensions will be
- added as Amiga, Inc. adds new standards such as 80 bit extended format.
-
- The 'Init' entries in the math resource structure are only used if the
- corresponding bit is set in the Flags field. So if your device is just a
- 68881, you do not need the Init entries. Make sure you have cleared the
- Flags field. This should allow us to add Extended Precision later. For Init
- users, make sure you add yourself into the Open/Close/Expunge vectors for
- this library.
-
-
-
-
- The library structure that is used is tentatively laid out as shown below.
- I say tentatively because the names of the entries may still change. The
- order of entries, their usage and size will not change. Naturally we may add
- new fields to the end.
-
-
-
-     STRUCTURE MI,LIB_SIZE          ; Standard library node
-         UBYTE   io8_Flags          ; is this 68881?
-         UBYTE   io8_pad            ; line up to next 32bit boundary
-         ULONG   io8_68881          ; ptr to io68881 base
-         ULONG   io8_SysLib         ; ptr to SysBase
-         ULONG   io8_SegList        ; ptr to this SegList
-         ULONG   io8_Resource       ; ptr to mathIEEE.resource
-         ULONG   io8_opentask       ; called when task opens
-         ULONG   io8_closetask      ; called when task closes
-         LABEL   MI_SIZE
-
-
- Of particular interest to hardware developers are the opentask and closetask
- entry points. These functions will be called when a task calls OpenLibrary
- and CloseLibrary. This will give the vendor the opportunity to set up any
- per task initialization necessary. The Amiga library presently sets them up
- as NOPs in the case of straight emulation. It puts the 68881 initialization
- code in there for the 68020/68881 as well as the peripheral 68881. That
- initialization code currently sets up rounding modes and interrupt requests.
-
- If you need to override the defaults, you will have to set the appropriate
- Alt bits in the Resource structure and overwrite the opentask/closetask
- fields when your AltInit function is called. The OpenLibrary routine checks
- the return value of opentask for errors. If d0.l is nonzero on return,
- OpenLibrary will return 0 to the task trying to open the library.
-
-
- On the 68020/68881 some new exceptions are generated. Unfortunately the
- V1.2 operating system does not properly initialize these. For users of the
- new ramkick/A2024 system, the fixes have been added to the exec.library.
- For the rest we provide a program to run during your startup sequence to
- initialize the vectors and redirect processing back to exec when the new
- exceptions occur. This is only necessary on 68020/68881 systems.
-
-
-
- Benchmarks
-
- This section contains some benchmarks comparing the performance of the
- various Amiga math libraries. Use these as a guide when selecting the
- math routines to be used for your application.
-
- All these benchmarks show the results when compiling under Green Hills C.
- The results you get with another compiler will vary.
-
-
-
-
-
-
-
-
-
-
-
-
-
- How does V1.3 stack up to V1.2?
- A Comparison of Software
-
-                          V1.2        V1.3        V1.2
-                          IEEE        IEEE       MathFFP
-   Float
-     10000 (secs)         92.14       45.22       17.64
-     256000 (secs)       580.58      282.52      136.78
-
-   Calcpi
-     (kflops/sec)          2.07        4.93       11.14
-     PI error          -5.5e-14    -1.4e-11      6.1e-5
-
-   Whetstone
-     (kwhets/sec)            12          24          78
-
-   Savage
-     (secs)                 N/A         470        98.2
-
- System tested: A1000, 512k chip memory, 1 external floppy
-
-
-
- Transparent Increase in Speed
-
-                        V1.3/000     000/881     020/881
-   Float
-     10000 (secs)         45.22       19.18       13.46
-     256000 (secs)       282.52      179.98      122.46
-
-   Calcpi
-     (kflops/sec)          4.93        7.89       11.78
-     PI error         -1.39e-11   -2.78e-11   -2.78e-11
-
-   Whetstone
-     (kwhets/sec)            24          81         124
-
-   Savage
-     (secs)                 470        20.4        15.2
-     error              -6.9e-7     -5.6e-7     -5.6e-7
-
-
-
- Systems tested:
-
- V1.3/000 was an A1000 with 512k.
- 000/881 was an A1000 with 512k plus 2M and Microbotic's "881 Starmath".
- 020/881 was an A2000 with CSA's 68020/68881, 2M memory and a 2090a.
-
- Penultimate Speed Tests:
- Comparison of Speed Using
- Inline F instructions
-
-                        V1.3/000     020/881
-   Float
-     10000 (secs)         45.22        0.26*
-     256000 (secs)       282.52       15.86
-
-   Calcpi
-     (kflops/sec)          4.93       81.3
-
-   Whetstone
-     (kwhets/sec)            24         459
-
-   Savage
-     (secs)                 470         4.6
-
- Systems tested:
-
- V1.3/000 was an A1000 with 512k and 1 external floppy.
- 020/881 was an A2000 with CSA's 68020/881, 2M memory and a 2090a.
-
- Note: Under this test, the 020/881 test code will not run on a
- standard 68000 based system.
-
- * The Green Hills compiler may have optimized this benchmark to nothing.
-
-
- Penultimate Speed Tests, II:
- Inline Results With
- Fast 32-Bit Memory
-                                                        Inline     Inline
-                          020      020/881    030/882    020/881    030/882
-   Float
-     10000 (secs)         25.6       6.08       5.16       0.24*      0.18*
-     256000 (secs)      168.74      54.08      47.52      15.28      13.16
-
-   Calcpi
-     (kflops/sec)         8.44      25.29      28.8       90.09     114.42
-
-   Whetstone
-     (kwhets/sec)           39        263        291        673        889
-
-   Savage
-     (secs)              320.8        8.4        7.6       4.46       3.98
-
- Systems tested:
-
- 020 was an A2000 with CSA's 020 board running at 14 MHz.
- 020/881 was an A2000 with CSA's 020/881 board running at 14 MHz.
- 030/882 was an A2000 with CSA's 030/882 board running at 14/16 MHz.
-
- * The Green Hills compiler may have optimized this benchmark to nothing.
-
-
-
-
-
-